How Does Word Length Evolve in Written Chinese?
Abstract
We present substantial evidence that word length is an essential lexical structural feature for word evolution in written Chinese. The data used in this study are diachronic Chinese short narrative texts spanning more than 2,000 years. We show that the increase in word length is an essential regularity in word evolution. On the one hand, word frequency is found to depend on word length, and their relation follows the power-law function $y = ax^{-b}$. On the other hand, our deeper analyses show that the increase in word length results in the simplification of characters for the sake of balance in written Chinese. Moreover, the correspondence between written and spoken Chinese is discussed. We conclude that the disyllabic trend may account for the increase in word length, and that its impacts can be explained by "the principle of least effort".
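As an illustration of the power-law relation named in the abstract, the following minimal sketch fits $y = ax^{-b}$ by ordinary least squares in log-log space. It is not the authors' procedure: the function name `fit_power_law`, the reading of x as word length and y as frequency, and the synthetic numbers are all assumptions made for the example.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by least squares on (log x, log y); return (a, b).

    Hypothetical helper for illustration; assumes all xs and ys are positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx                    # log y = log a + slope * log x
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope   # a = e^intercept, b = -slope

# Synthetic data only (NOT from the study): generated with a = 1000, b = 2.
lengths = [1, 2, 3, 4, 5]                # assumed x: word length in characters
freqs = [1000.0 * x ** -2.0 for x in lengths]  # assumed y: frequency

a, b = fit_power_law(lengths, freqs)
print(round(a, 3), round(b, 3))
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating parameters up to floating-point error. On real counts, a log-log fit weights small frequencies heavily, so it should be read as a rough check rather than a definitive estimate.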
Similar resources
How does language change as a lexical network? An investigation based on written Chinese word co-occurrence networks
Language is a complex adaptive system, but how does it change? For investigating this process, four diachronic Chinese word co-occurrence networks have been built based on texts that were written during the last 2,000 years. By comparing the network indicators that are associated with the hierarchical features in language networks, we learn that the hierarchy of Chinese lexical networks has ind...
Approaching the Chinese Word Segmentation Problem with CHR Grammars
Written Chinese text does not include separators between words, as do European languages using space characters, and this creates the Chinese Word Segmentation Problem: given a text in Chinese, divide it in a correct way into segments corresponding to words. Correctness means how a competent Chinese language user would do this. CHR Grammars (CHRG) is an implemented grammar system that allows hi...
Phonological codes as early sources of constraint in Chinese word identification: A review of current discoveries and theoretical accounts
A written Chinese character has a more direct connection with its meaning than a written word in English does. Moreover, because there is no unit in the writing system that encodes single phonemes, grapheme-phoneme mappings are impossible. These unique features have led some researchers to speculate that phonological processing does not occur in visual identification of Chinese words or that me...
Bilingualism, Biliteracy and Metalinguistic Awareness: Word Awareness in English and Japanese Users of Chinese as a Second Language
Cross-linguistic research shows that some aspects of metalinguistic awareness are affected by characteristics of different writing systems. Users of writing systems that mark word boundaries (such as English) develop word awareness, while users of unspaced writing systems (such as Chinese) do not. Previous research showed that English-speaking users of Chinese as a Second Language (CSL) have hi...
Can MDL Improve Unsupervised Chinese Word Segmentation?
It is often assumed that Minimum Description Length (MDL) is a good criterion for unsupervised word segmentation. In this paper, we introduce a new approach to unsupervised word segmentation of Mandarin Chinese that leads to segmentations whose Description Length is lower than what can be obtained using other algorithms previously proposed in the literature. Surprisingly, we show that this lower...